Dataset Description
17:10:26 hours | 11 GB speech data | 61 speakers | 12,036 audio segments | 48 kHz | 16-bit WAV
Dogri, the language of the Dogras, belongs to the Indo-Aryan group and is the first major language of the multilingual Jammu region of the state of Jammu & Kashmir. It derives its name from ‘Duggar’, the ancient title of this region. Dogri is a morphologically rich language with a predominant Subject-Object-Verb (SOV) word order and, like many Indian languages, the flexibility to rearrange constituents.
Dogri had its own script, namely “Dogare Akkhar” or “Dogare”, based on the Takri script, which is closely related to the Sharada script employed by the Kashmiri language. This was the official script during the reign of Maharaja Ranbir Singh (1857-1885 AD). After independence, the state government constituted a committee on 29th October 1953, headed by Sh. Girdhari Lal Dogra. The committee presented a report, and accordingly the state government decided to adopt the Devanagari as well as the Persian script for Dogri; this decision was incorporated in the State Constitution in 1957.
The LDC-IL speech data was collected in Jammu from speakers of both genders and different age groups. The LDC-IL Dogri Speech data set consists of different types of datasets made up of words, sentences, running texts and date formats. Each speaker recorded these datasets, which were randomly selected from a master dataset.
The available Speech Corpus details:
Total Speakers 61 (30 Female and 31 Male)
Domain | Audio Segments | Duration (h:mm:ss) |
Contemporary Text (News) | 60 | 4:27:51 |
Creative Text | 61 | 2:51:42 |
Sentence | 1527 | 1:24:48 |
Date Format | 122 | 0:14:07 |
Command and Control Words | 1830 | 1:24:31 |
Person Name | 1222 | 1:23:41 |
Place Name | 609 | 0:29:10 |
Most Frequent Word - Part | 1831 | 1:18:06 |
Most Frequent Word - Full Set | 2000 | 1:16:27 |
Phonetically Balanced | 2050 | 1:50:38 |
Form and Function - Word | 724 | 0:29:25 |
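The per-domain figures above can be cross-checked against the headline totals (12,036 audio segments, 17:10:26 hours). The following is a minimal sketch of that arithmetic. The values are copied from the listing, while the names `DOMAINS`, `to_seconds` and `to_hms` are illustrative helpers and not part of any LDC-IL tooling.

```python
# Per-domain figures copied from the corpus listing:
# (domain, number of audio segments, duration as h:mm:ss)
DOMAINS = [
    ("Contemporary Text (News)", 60, "4:27:51"),
    ("Creative Text", 61, "2:51:42"),
    ("Sentence", 1527, "1:24:48"),
    ("Date Format", 122, "0:14:07"),
    ("Command and Control Words", 1830, "1:24:31"),
    ("Person Name", 1222, "1:23:41"),
    ("Place Name", 609, "0:29:10"),
    ("Most Frequent Word - Part", 1831, "1:18:06"),
    ("Most Frequent Word - Full Set", 2000, "1:16:27"),
    ("Phonetically Balanced", 2050, "1:50:38"),
    ("Form and Function - Word", 724, "0:29:25"),
]


def to_seconds(hms: str) -> int:
    """Convert an h:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total: int) -> str:
    """Convert a number of seconds back to an h:mm:ss string."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


total_segments = sum(count for _, count, _ in DOMAINS)
total_seconds = sum(to_seconds(duration) for _, _, duration in DOMAINS)

print(total_segments)         # 12036
print(to_hms(total_seconds))  # 17:10:26
```

Both sums agree with the totals stated in the dataset summary and under Item specifics.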
A detailed explanation of the Dogri Speech Corpus will be available in the Dogri Raw Speech Documentation.
For research-based citations, please use the following citation:
- Narayan Kumar Choudhary, Sunil Kumar Choudhary, Rajesha N., Manasa G., 2021. Dogri Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
Item specifics
- Authors: Narayan Kumar Choudhary, Sunil Kumar, Rajesha N., Manasa G.
- Corpus Type: Raw Corpus
- Catalogue Number: 1275
- ISBN: 978-81-948885-7-4
- Data Source: On Field
- Duration: 17:10:26
- # of Audio Segments: 12036
- Release Date: 15-Jun-2021
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.